Discourse Annotation Working Group Report

Authors

  • Manfred Stede
  • Janyce Wiebe
  • Eva Hajicová
  • Brian Reese
  • Simone Teufel
  • Bonnie Webber
  • Theresa Wilson
Abstract

The classical “success story” of corpus annotation is that of the various syntax treebanks, which provide structural analyses of sentences and have enabled researchers to develop a range of new and highly successful data-oriented approaches to sentence parsing. In recent years, however, a number of corpora have been constructed that provide annotations on the discourse level, i.e., information that reaches beyond sentence boundaries. Phenomena that have been annotated include coreference links, the scope of connectives, and coherence relations. For many of these phenomena there is no general agreement in the research community on how to handle them, and therefore the “recycling” of corpora by other people and for other purposes is often difficult. (To some extent, this is due to the fact that discourse annotation deals “only” with surface reflections of underlying, abstract objects.) At the same time, the effort needed to build high-quality discourse corpora is considerable, and thus one should be careful in deciding how to invest it. One way of providing added value with annotation projects is shared corpora: if a variety of annotation efforts are carried out on the same primary data, the resulting series of annotation levels can yield insights that the creators of the individual levels had not explicitly planned for. A clear case is the relationship between coherence relations and connective use: when both levels are marked individually and with independent annotation guidelines, the correlations between coherence relations, cue usage, and possibly other factors (if annotated) can afterwards be studied systematically. This conception of multi-level annotation presupposes, of course, that the technical problems of setting annotation levels in correspondence to one another have been resolved. The panel on discourse annotation is organized by Manfred Stede and Janyce Wiebe. It aims at surveying the scene of discourse corpora, exploring chances for synergy, and identifying desiderata for future corpus creation projects. In preparation for the panel, the participants have provided the following short descriptions of the various corpora in whose construction they have been involved.

Similar resources

Evaluation of Annotation Schemes for Japanese Discourse (Japanese Discourse Tagging Working Group)

This paper describes the standardization of discourse annotation schemes for Japanese and evaluates the reliability of these schemes. We propose three schemes, covering utterance units, discourse segments, and discourse markers. These schemes have been incrementally improved based on experimental results, and their reliability is estimated to be in the "good" range.

The Penn Discourse Treebank 2.0 Annotation Manual

This report contains the guidelines for the annotation of discourse relations in the Penn Discourse Treebank (http://www.seas.upenn.edu/~pdtb), PDTB. Discourse relations in the PDTB are annotated in a bottom up fashion, and capture both lexically realized relations as well as implicit relations. Guidelines in this report are provided for all aspects of the annotation, including annotation expli...

The Annotation Scheme of the Turkish Discourse Bank and an Evaluation of Inconsistent Annotations

In this paper, we report on the annotation procedures we developed for annotating the Turkish Discourse Bank (TDB), an effort that extends the Penn Discourse Tree Bank (PDTB) annotation style by using it for annotating Turkish discourse. After a brief introduction to the TDB, we describe the annotation cycle and the annotation scheme we developed, defining which parts of the scheme are an exten...

Annotation of Discourse Connectives for the Prague Dependency Treebank

The paper presents a preliminary study on discourse connectives (DC) in Czech. Aiming to build a computerized language corpus capturing discourse relations in Czech, we base our observations on current foreign projects with the same purpose. In this study, first, the different methods of linguistic analysis of the discourse structure and discourse connectives are described, next, the nature and...

PDTB-style Discourse Annotation of Chinese Text

We describe a discourse annotation scheme for Chinese and report on the preliminary results. Our scheme, inspired by the Penn Discourse TreeBank (PDTB), adopts the lexically grounded approach; at the same time, it makes adaptations based on the linguistic and statistical characteristics of Chinese text. Annotation results show that these adaptations work well in practice. Our scheme, taken toge...

Publication date: 2007